Skip to content

Conversation

HAIAI
Copy link
Collaborator

@HAIAI HAIAI commented Sep 26, 2025

Purpose

rebase/sync branch to latest main

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

njhill and others added 30 commits September 19, 2025 16:34
Signed-off-by: Boyuan Feng <[email protected]>
Signed-off-by: Boyuan Feng <[email protected]>
Signed-off-by: boyuanfeng <[email protected]>
Co-authored-by: Luka Govedič <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…speed (#23558)

Signed-off-by: Manoel Marques <[email protected]>
Signed-off-by: Manoel Marques <[email protected]>
Co-authored-by: Harry Mellor <[email protected]>
Co-authored-by: Luka Govedič <[email protected]>
Signed-off-by: Woosuk Kwon <[email protected]>
Signed-off-by: Woosuk Kwon <[email protected]>
Signed-off-by: Woosuk Kwon <[email protected]>
Signed-off-by: Woosuk Kwon <[email protected]>
Signed-off-by: Woosuk Kwon <[email protected]>
Signed-off-by: Woosuk Kwon <[email protected]>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request appears to be a large rebase of a development branch, introducing a wide array of changes across the codebase. Key updates include a major refactoring of the CI/CD pipelines, deprecation of old benchmark scripts in favor of a new CLI, and the addition of numerous new benchmarks. On the feature side, there's new support for FP8 on ROCm, various new fused kernels for performance, and significant improvements to the CPU backend with oneDNN and scalar fallbacks. I've identified two critical correctness issues related to potential out-of-bounds memory access in MoE kernels, which I've detailed in the comments below. The rest of the changes, including new features and extensive refactoring, appear solid.

Comment on lines 46 to +49
int expert_id = topk_ids[i];
if (expert_id >= num_experts) {
continue;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

This bounds check on expert_id is a critical fix. Without it, an invalid ID from topk_ids could lead to an out-of-bounds atomic add on shared_counts, potentially causing memory corruption or a crash. This is an important safeguard for correctness and stability.

Comment on lines 103 to +109
int32_t expert_id = topk_ids[i];
if (expert_id >= num_experts) {
continue;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Adding this bounds check for expert_id is crucial for correctness and security. It prevents potential out-of-bounds writes to cumsum_buffer and sorted_token_ids if topk_ids contains an invalid expert ID, which could otherwise lead to memory corruption or crashes.

if ctype is None:
ctype = {"jpg": "image/jpg", "png": "image/png"}[ext]
self.send_response(200)
self.send_header("Content-Type", ctype)

Check warning

Code scanning / CodeQL

HTTP Response Splitting Medium test

This HTTP header is constructed from a
user-provided value
.

Copilot Autofix

AI 4 days ago

To fix the potential HTTP response splitting vulnerability, sanitize or strictly validate any user-derived input used for constructing HTTP header values. Here, before using ctype as a header value, we should ensure that it contains no CR, LF, or colon characters that could allow header injection. We can do this by stripping or replacing these characters if present. It's also prudent to be defensive, as a belt-and-braces approach; even though in this code only jpg/png extensions are accepted, we should sanitize just in case. The fix will:

  • Strip/disallow \r, \n, and : characters from ctype before it is used in send_header.
  • Do this immediately before using ctype on line 1173.

No new methods or imports are necessary, as simple string methods suffice for this sanitization.


Suggested changeset 1
tests/conftest.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/tests/conftest.py b/tests/conftest.py
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -1170,7 +1170,8 @@
         if ctype is None:
             ctype = {"jpg": "image/jpg", "png": "image/png"}[ext]
         self.send_response(200)
-        self.send_header("Content-Type", ctype)
+        safe_ctype = ctype.replace("\n", "").replace("\r", "").replace(":", "")
+        self.send_header("Content-Type", safe_ctype)
         self.send_header("Content-Length", str(len(data)))
         self.end_headers()
         self.wfile.write(data)
EOF
@@ -1170,7 +1170,8 @@
if ctype is None:
ctype = {"jpg": "image/jpg", "png": "image/png"}[ext]
self.send_response(200)
self.send_header("Content-Type", ctype)
safe_ctype = ctype.replace("\n", "").replace("\r", "").replace(":", "")
self.send_header("Content-Type", safe_ctype)
self.send_header("Content-Length", str(len(data)))
self.end_headers()
self.wfile.write(data)
Copilot is powered by AI and may make mistakes. Always verify output.
Copy link
Member

@yewentao256 yewentao256 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry what is this pr for? Do we need to land it? Please give more context in the description

@github-project-automation github-project-automation bot moved this from To Triage to In progress in gpt-oss Issues & Enhancements Sep 26, 2025
@HAIAI
Copy link
Collaborator Author

HAIAI commented Sep 26, 2025

@yewentao256 try to land it.

Copy link
Member

@yewentao256 yewentao256 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't realize that this is from main to dev, not dev to main.

@github-project-automation github-project-automation bot moved this from In progress to Ready in gpt-oss Issues & Enhancements Sep 26, 2025
@HAIAI HAIAI added the ready-for-merge Indicate this PR is ready to be merged by the maintainers, used by reviewers without merge access. label Sep 26, 2025
@hmellor hmellor merged commit aee7633 into amd_dev Sep 26, 2025
17 of 26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci/build deepseek Related to DeepSeek models documentation Improvements or additions to documentation frontend gpt-oss Related to GPT-OSS models kv-connector llama Related to Llama models multi-modality Related to multi-modality (#4194) new-model Requests to new models performance Performance-related issues qwen Related to Qwen models ready-for-merge Indicate this PR is ready to be merged by the maintainers, used by reviewers without merge access. rocm Related to AMD ROCm speculative-decoding structured-output tool-calling tpu Related to Google TPUs v1
Projects
Status: Done
Status: Done
Status: Done
Development

Successfully merging this pull request may close these issues.